(54) Digital signal processing apparatus

(57) A digital signal processing apparatus which is
used for computation such as the coding of image signals,
and a motion compensative operation method which
uses a digital signal processing apparatus. The apparatus
comprises a plurality of signal processing means
arranged in parallel and control means which assigns
loads to the signal processing means so that the signal
processing means have even computation volumes.
Alternatively, an address generator is provided for each
of data sets entered independently. An intermediate
check is conducted during the computation for a block
which involves a motion compensative operation.
Description

BACKGROUND OF THE INVENTION 
Field of the Invention

The present invention relates to a digital signal processing apparatus which performs computational processes for 
digital signals. 

Description of the Prior Art

Fig. 1 shows the multiprocessor system described in an article entitled "A Real Time Video Signal Processor Suitable
for Motion Picture Coding Applications", IEEE, GLOBECOM '87, p. 453. In Fig. 1, input data 1 is received by a data transfer
controller 3, and thereafter data 4 are transferred selectively to digital signal processors 2, i.e., DSP-1 through DSP-N,
in block-1. After being processed by the respective DSPs in block-1, resultant data 5 is transferred to block-2 and
processed by the respective DSPs for the next processing step.

Fig. 2(a) shows divided memory areas of the DSPs. For simplicity of explanation, shown here is an example of
parallel processing using three DSPs 2, to which process areas A, B and C are assigned evenly.

In the inter-frame image coding system and the like, it is a general convention to employ the conditional pixel
supplementary process, in which only portions having at least a certain difference between the input frame and the previous
frame are coded and previous frame data is used for the remaining portions. Accordingly, the volume of computation
needed for the process differs depending on the valid pixel rate even though the number of pixels in the process area
is constant. The volume of computation, or the computation time needed, is proportional to the valid pixel rate.

In the inter-frame image coding system or the like, assuming that the valid pixels are distributed among the DSPs
as EA, EB and EC as shown in Fig. 2(b), the computation time needed for one block of the parallel DSP
configuration is determined by the process time of the DSP which works on the area B with the largest volume of
process M, and the remaining DSPs, which have finished the areas A and C earlier, have idle time.

The conventional digital signal processing apparatus arranged as described above has its overall process time
determined by the longest process time among the DSPs when the density of information, such as the valid pixel rate,
within a frame is uneven and the distribution of information varies with time, resulting in a degraded process efficiency
per DSP unit.
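
As an illustration only, the dependence of the block process time on the most heavily loaded DSP can be sketched as follows. The function name and the valid-pixel counts are hypothetical and do not appear in the cited article; process time is assumed to be exactly proportional to the number of valid pixels.

```python
def block_time_and_idle(valid_pixels):
    """Return (block time, idle time per DSP) for one parallel DSP block.

    Assumption: each DSP's process time equals its valid-pixel count,
    and the block finishes only when the slowest DSP finishes.
    """
    block_time = max(valid_pixels)
    idle = [block_time - v for v in valid_pixels]
    return block_time, idle


# Hypothetical valid-pixel counts EA, EB and EC for areas A, B and C.
loads = [300, 900, 600]
block_time, idle = block_time_and_idle(loads)

# Fraction of the available DSP time spent on useful work.
efficiency = sum(loads) / (len(loads) * block_time)
```

With these assumed counts, the block time equals the load of area B, and the DSPs assigned to areas A and C remain idle for the rest of the block time.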

Fig. 3 is a diagram showing, as an example, the arrangement of another digital signal processing apparatus disclosed
in an article entitled "Realtime Video Signal Processor Module", in the proceedings of ICASSP '87, pp. 1961 - 1964, April
1987, Dallas, U.S.A. In the figure, indicated by 1 is an input terminal, 4 is an input bus for distributing input data on the
input terminal 1, 28a is a feedback bus for distributing the result of the previous process, and 20 are signal processing
modules each including an input storage 21, a processing unit 22, an output storage 23 and a timing control unit 24.
Indicated by 25 are wired-OR circuits through which feedback data on output ports 30 are placed on the feedback bus
28a, 26 are wired-OR circuits through which output data on output ports 29 are delivered to the output terminal 5 over
the output bus 5a, 27 are input ports for the input data to the signal processing module 20, and 28 are input ports for
the feedback data to the signal processing module 20.

Fig. 4 is a block diagram showing in more detail one of the signal processing modules in Fig. 3. In the figure, indicated
by 221 is an address generator (AGU A), 211 is an input dual memory (MEM A) which receives data on the input port
27 over the input bus 4, 212 is an input dual memory (MEM B) which receives data on the feedback bus 28a by way of
the input port 28, 222 is an address generator (AGU B), 223 is an X-bus, 224 is a Y-bus, and 225 is a pipeline arithmetic
unit (PAU) having its input terminal EX1 connected to the X-bus 223 and another input terminal EX2 connected to the
Y-bus 224. Indicated by 226 is a data memory [MEM P(Q)] having its output connected to the X-bus 223, 227 is an
address generator [AGU P(Q)] having its output connected to the Y-bus 224 and data memory 226, 228 is a mode
register (MDR) having its output connected to the X-bus 223 and Y-bus 224, and 241 is a Z-bus connected to the inputs
of the address generators 221, 222 and 227, pipeline arithmetic unit 225 and data memory 226. Indicated by 242 is a
sequencer (SEQ), 243 is an instruction memory (I RAM) connected to the output of the sequencer 242, and 245 is a
decoder (DEC) connected to the output of the instruction memory 243, with the output of the decoder 245 being
connected to the Z-bus 241 and output bus 231. The output bus 231 is connected to the input of the mode register 228 and
the Z-bus 241. Indicated by 232 is a FIFO memory (MEM C) connected to the output bus 231, 233 is a FIFO memory (MEM
D) connected to the output bus 231, 29 is an output port of the FIFO memory 232, and 30 is an output port of the FIFO
memory 233.

Fig. 5 is a diagram showing, as an example, the algorithm of a typical high-efficiency coder for a moving image. In
the figure, indicated by 250 is an input terminal for the input video signal, 251 is an input frame buffer having at least a
1-frame capacity and having the simultaneous read-write ability, 252 is an inter-frame subtracter for evaluating the
difference, 253 is a block identifier, 254 is a coder, 255 is a coding parameter produced by the coder 254, 256 is a
variable-length coder, [...] connected in cascade between the [...] the coding parameter 255 [...] blocks 251 - 254 and
256 - 258. Further, indicated by 260 is a local decoder, 261 is an inter-frame adder, [...] fed from the input frame buffer
251 to the [...], 265 is a motion compensator, 266 is current frame data fed from the [...] motion compensator 265 to the
[...], 267 is motion vector data, 268 is a feedback signal, and 270 is a coding controller which [...] inter-frame subtracter
252 and [...] to the input frame buffer 251.

The operation of the conventional [...] each signal processing module 20 operates on [...] to fetch a divided frame
area [...] bus 4 and store the data in the input storage 21. At the same time, [...] frame [...] for the current process, it
operates by expending one [...] stores the data in the input storage 21 [...] the feedback data from the input port 28 over
[...]. Upon expiration of one video frame time, the [...] the prescribed signal processing for the input data and feedback
data stored in [...] temporarily in the output storage 23. [...] 30 [...] circuit 26, delivered to the output terminal 5 over [...].

Divided frame areas processed individually by [...] modules 20 are combined back to a video frame. Therefore, parallel
processing [...]. [...] reason as described above, it is necessary for all signal processing [...] commencement in complete
synchronism with one another. On this account, the timing control unit [...] commencement in synchronism with the [...].

Next, the operation of one signal processing module 20 will be briefed in connection with Fig. 4. Among a video
frame entered frame-wise through the [...] data of the assigned area is stored in the input dual memory 211. [...] the input
port 28, the portion of the assigned area and its [...] are stored in the input dual memory 212.

The input dual memories 211 and 212 [...] same structure on both sides and operate such that while one side [...] to
the X-bus 223 and Y-bus [...]. The reading sides of the input dual memories 211 and 212 [...] by the address [...] in
accordance with the [...] decoder 245 by decoding a 80-bit length [...] the address of the command memory 243 indicated
by the sequencer 242. The [...] X-bus 223 and Y-bus 224 are entered in parallel to the pipeline arithmetic unit 225, which
[...] processing including coding and local [...] placed on the Z-bus, the coded output is [...] the FIFO memories [...] which
is necessary for the process of the [...]. [...] register 228 consists of a register file including [...] values from the decoder 245.

This digital signal processing apparatus is [...] foregoing area division parallel processing, and is intended such that
each signal [...] area independently on a realtime basis. When the digital signal processing apparatus [...] of a coder as
shown in Fig. 5, only portions excluding the variable-length coder 256, video multiplexer 257, transmission buffer 258 and
coding controller [...]
[...] Namely, it is not suitable for a continuous process in one video frame, and is limited to the inter-[...] loop
process from the input frame buffer 251 to the block identifier 253, coder [...] 260, coding frame memory 263, and to the
motion compensator 265, useful for data completely divisible within a [...].

[...] processing module 20 implements the same process for each frame, the processing program [...] memory 243 can
be a single program. [...] a frame is divided into M areas ([...] greater than or equal to 1), the number of process cycles
Nc per pixel which can be dealt with on a realtime basis by one signal processing module 20 is given by the following
calculation.

$$ N_c = \frac{M_c \cdot T_f}{M_p \cdot N_p} \quad \text{(clocks/pixel)} $$

where Mc is the frequency of the machine cycle (Hz), Tf is the frame period (sec), Mp is the number of horizontal pixels in
the assigned area, and Np is the number of vertical pixels in the assigned area.
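
As a numerical illustration only, the relation for Nc can be evaluated as sketched below. The relation is read here as Nc = Mc·Tf/(Mp·Np), which is consistent with the stated unit of clocks per pixel; the 10 MHz machine cycle, the 1/30 sec frame period and the 352-by-288 frame are hypothetical figures chosen for the example, not values from the cited article.

```python
def cycles_per_pixel(mc_hz, tf_sec, mp, np_):
    """Process cycles available per pixel: Nc = Mc * Tf / (Mp * Np).

    mc_hz  : machine cycle frequency Mc in Hz
    tf_sec : frame period Tf in seconds
    mp     : horizontal pixels Mp in the assigned area
    np_    : vertical pixels Np in the assigned area
    """
    return mc_hz * tf_sec / (mp * np_)


# Hypothetical figures: 10 MHz machine cycle, 30 frames/sec, 352 x 288 frame.
whole_frame = cycles_per_pixel(10e6, 1 / 30, 352, 288)

# The same frame divided into four horizontal strips of 72 lines each,
# one strip per signal processing module.
quarter_frame = cycles_per_pixel(10e6, 1 / 30, 352, 288 // 4)

ratio = quarter_frame / whole_frame
```

Under these assumptions the per-pixel cycle budget of one module grows in proportion to the number of areas, since each module handles a quarter of the pixels in the same frame period.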

On this account, if a frame is divided into four areas, for example, each having the assignment of a signal processing
module 20, the number of process cycles Nc is increased fourfold, and it becomes possible for the [...].

[...] processing modules [...] as described above have the [...] problems [...]

(a) For the achievement of very [...] processing, a frame must be divided into numerous small areas; however,
[...] algorithm does not allow independent processes for areas below a certain size. Therefore, realtime processing
cannot be achieved by increasing the parallelism.
(b) [...] a fixed distribution of load to signal processing modules, the process time must be set to meet the
longest one when each signal processing module has a different process time. Therefore, the system has
unnecessarily increased parallelism relative to the processing capacity.
(c) Data input and data processing each take one frame time, and data input and output each need a 1-frame buffer
memory, resulting in a longer [...] time and an increased memory capacity. Therefore, [...]

Fig. 6 is a block diagram of the conventional digital signal processing system disclosed in the proceedings (No. S10-1)
of the 1986 annual convention of the communication department of The Institute of Electronics and Communication
Engineers of Japan. In the figure, indicated by 31 is a dual-port internal data memory [...] simultaneously, 32 is an
address generator which calculates the address [...] data or write data, 33 is a data bus used for the internal transfer of
data related to computation, 34 and 35 are [...] 31, 36 is a register which holds computation data selected by the selector
34, 37 is a register which holds computation data selected by the selector 35, 38 is a multiplier, 39 is a [...] output of the
[...] 44, 41 is a selector which selects the output of the registers 39 or 37, 42 is an arithmetic/logic unit which [...]
computations for the outputs of the selectors 40 and 41, and 43 is a selector [...] data register 46. The accumulators 44
are used to hold [...] computations. The external data register 46 is to hold data from an external [...].

Next, the operation will be described. This signal processing system based on a digital signal processor performs
command fetching and [...] processing [...]. The following describes the [...] command [...] memories [...] contained in
the micro[...].

Arithmetic operations for two inputs, including addition, subtraction, maximum evaluation, minimum evaluation, etc.
[...]

where i = 1 to N, and ai, bi and ci are sets of independent data stored in the 2P-RAM 31.

Fig. 7 shows the sequence of process for implementing the 3-input operation of the form of expression (1) by the 
digital signal processing system, for example, shown in Fig. 6. 

The data address generator 32 sets up the starting addresses for two data sets A and B, and selects the simple
incremental mode. Then the two data sets A and B are loaded through the selectors 34 and 35 into the registers 36 and
37. The selectors 40 and 41 select the registers 36 and 37, respectively, so that the arithmetic/logic unit 42 implements
the arithmetic operation ai ⊕ bi. The selector 43 selects the arithmetic/logic unit 42 to hold the operation result temporarily
in one of the accumulators (ACC0 - ACC3) 44, and the resultant data is sent over the data bus 33 and through the external
data register 46 and stored in the external memory 47, whose addressing mode is the simple incremental mode because
it is linked to one of the addresses for the 2P-RAM 31 in the address generator 32.

In the subsequent step ST3, the data address generator 32 sets up the starting addresses of the data set C and
data set ai ⊕ bi, and ci data is read out of the 2P-RAM 31 to the register 36. The selector 35 selects the data bus to
load the data of ai ⊕ bi in the external memory 47 into the register 37. In this case, in order to have a coincident timing
of reading for the data set C and data set ai ⊕ bi, step ST4 needs to expend two cycles of useless command reading
for the external memory in advance.

The two sets of data are multiplied by the multiplier 38 in step ST5, and the result is stored in the
register 39. In the next cycle, the resultant data is passed through the arithmetic/logic unit 42 and, after being held
temporarily in one of the accumulators (ACC0 - ACC3) 44, transferred over the data bus 33 to the 2P-RAM 31.

These operations are carried out in parallel on the basis of the pipeline process, and the operations from the reading
of the 2P-RAM 31 until the storing of the process result in the external memory 47 for N pieces of data sets will take N + 3
machine cycles in the case of an arithmetic operation.

The steps of operations are listed in the following Table 1 and Table 2. Table 1 is for the operation of ai ⊕ bi and the
transfer of the result to the external memory 47, and Table 2 is for the reading of the resultant ai ⊕ bi from the external
memory 47, the operation of (ai ⊕ bi) · ci, and the transfer of the result to the 2P-RAM. In both tables, symbol V
represents an indefinite value. Storing in the external data register 46 completes in machine cycle N + 3 in both tables, and
the external data register 46 is read uselessly in machine cycle 0 (two machine cycles) in Table 2.
Next, after two useless reading cycles of the external memory 47 for timing purposes, multiplication is carried out
for N pieces of data sets and the results are stored in the 2P-RAM 31. These operations take N + 3 machine cycles,
to which two command cycles for address initialization are added, and a total of 2N + 10 cycles are expended. An
operation of expression (2) also takes 2N + 10 cycles. Accordingly, it will be appreciated that if a 3-input-1-output operation
is conducted for N pieces of data sets using a processor capable of 2-input operation at most, it will take about 2N
machine cycles (provided that N is sufficiently large).

The following describes the cumulative operation for the results of the foregoing 3-input-1-output computation.

$$ S = \sum_{i=1}^{N} (a_i \oplus b_i) \times c_i \qquad (3) $$

$$ S = \sum_{i=1}^{N} (a_i \times b_i) \oplus c_i \qquad (4) $$

In the case of expression (3), the multiplication result for ai ⊕ bi and ci (output of register 39) and the intermediate
cumulative value are entered to the arithmetic/logic unit 42, and the result of summation is entered back to the same
accumulator 44 through the selector 43. Thereby, the process takes 2N + 10 cycles unchanged.

In the case of expression (4), the data sets (ai × bi) ⊕ ci which have been stored temporarily in the 2P-RAM 31 are
read out sequentially and summed by the arithmetic/logic unit 42, and therefore the process needs another N cycles,
resulting in a total of 3N + 10 cycles.
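
The cycle counts quoted above can be restated as a small model, given here only as a sketch. It assumes that each pass over N data sets costs N + 3 machine cycles plus two command cycles for address initialization, as the text states; the constant and function names are hypothetical.

```python
PIPELINE_CYCLES = 3   # a pass over N data sets takes N + 3 machine cycles
ADDRESS_SETUP = 2     # two command cycles for address initialization per pass


def pass_cycles(n):
    """Cycles for one 2-input pass over n data sets, including setup."""
    return n + PIPELINE_CYCLES + ADDRESS_SETUP


def cycles_three_input(n):
    """3-input-1-output operation done as two 2-input passes: 2N + 10."""
    return 2 * pass_cycles(n)


def cycles_expression_3(n):
    """Expression (3): accumulation overlaps the second pass, so 2N + 10."""
    return cycles_three_input(n)


def cycles_expression_4(n):
    """Expression (4): a further N cycles to read back and sum, so 3N + 10."""
    return cycles_three_input(n) + n


n = 100  # hypothetical number of data sets
summary = (cycles_three_input(n), cycles_expression_3(n), cycles_expression_4(n))
```

For a sufficiently large N the fixed overhead of 10 cycles becomes negligible, which is why the text approximates the 3-input-1-output cost as about 2N machine cycles.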

The conventional digital signal processing system is formed as described above, and therefore for a 3-input-1-output
operation of three independent data sets, it performs a 2-input-1-output operation twice. In addition, the process
time is further extended for address control, memory transfer and other processes.

Fig. 8 is a diagram showing in brief the image coding transmitter which implements the conventional motion
compensatory operation method disclosed in an article entitled "Dynamic Multistage Vector Quantization for Images", journal
of The Institute of Electronics and Communication Engineers of Japan, Vol. J68-B, No. 1, pp. 68 - 76, Jan. 1985. In the
figure, indicated by 1 is an input signal of image data formed of a plurality of consecutive frames on the time axis, 52 is
a motion compensator which produces a prediction signal on the basis of the resemblance computation of correlation
between the current frame represented by the input signal 1 and the previous frame represented by a previous frame
signal 53 which is the previous reduced signal 1, 54 is motion vector information provided by the motion compensator
52 indicative of the position of a prediction signal block, 55 is a prediction signal produced by the motion compensator
52, 56 is a coder which codes the difference between the input signal 1 and prediction signal 55, 57 is a decoder which
decodes the signal coded by the coder 56, and 58 is a frame memory which stores data reproduced through the
summation of the signal from the decoder 57 and the signal from the motion compensator 52.

The performance of the foregoing arrangement will be described in connection with Fig. 9. The motion compensation
process is to calculate for the input signal 1 the amount of distortion between a 11-by-12 block located in a specific
position in the current frame shown in Fig. 9(A) and M pieces of blocks in the search range S in the previous frame
shown in Fig. 9(B) to evaluate the position of the block y providing a minimal distortion relative to the position of the input
block, i.e., motion vector V, and to recognize the signal of the minimal distortion block as a prediction signal.

The number of motion vectors V under search within the search range S in the given frame is assumed to be M (an
integer greater than 1). The amount of distortion of the position of a specific motion vector V between the previous frame
blocks and the current input block is calculated as a sum of absolute values of differences as follows.

$$ d_i = \sum_{h=1}^{K} |y_{ih} - x_h| \qquad (5) $$

where input vector x = {x1, x2, ..., xK}, search object blocks yi = {yi1, yi2, ..., yiK}, i = 1, 2, ..., M, and M and K are
fixed values. The motion vector V is evaluated as follows.

$$ V = V_I \{\min d_i \mid i = 1, 2, \ldots, M\} \qquad (6) $$

Fig. 10 shows the sequence of operations for detecting the motion vector V. Step ST11 calculates a distortion di
over the K sampling points on the basis of expression (5), and the next step ST12 compares the di with the
minimal distortion D at position I, and, if di < D, the variables are replaced to be D = di and I = i. These operations are
repeated for the number of search vectors, i.e., the operational process of expression (6), to determine the final minimal
distortion D and its position I.
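
The search of steps ST11 and ST12 can be sketched as below. This is a minimal illustration of expressions (5) and (6) under the assumption that blocks are given as flat lists of K pixel values; the function names and the sample data are hypothetical.

```python
def distortion(x, y):
    """Expression (5): sum of absolute differences over the K sampling points."""
    return sum(abs(yh - xh) for xh, yh in zip(x, y))


def full_search(x, candidates):
    """Steps ST11 and ST12 repeated for all M search object blocks.

    Returns (I, D): the 1-based position of the minimal distortion block
    and the minimal distortion itself.
    """
    best_i, best_d = None, float("inf")
    for i, y in enumerate(candidates, start=1):
        d = distortion(x, y)          # step ST11
        if d < best_d:                # step ST12: replace D and I when di < D
            best_d, best_i = d, i
    return best_i, best_d


# Hypothetical input block (K = 4) and M = 3 search object blocks.
x = [10, 20, 30, 40]
candidates = [[12, 18, 33, 41], [10, 21, 30, 40], [0, 0, 0, 0]]
position, minimum = full_search(x, candidates)
```

Every candidate is evaluated over all K points, so the work per input block is on the order of K × M operations.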

These operations must be completed within the period of each frame entered successively, and therefore a high- 
speed digital signal processor is required. 

As an example, the digital signal processing system shown in Fig. 6 is used to carry out the motion compensation
process. In this case, the multiplication-sum operation takes place K × M times for each input block, and the number of
machine cycles is the total time expended by M times of processes including comparison and updating. Generally, the
number of cycles for comparison and updating is small enough as compared with that of the multiplication-sum operation,
and the volume of motion compensation operation for one block is virtually equal to K × M machine cycles.
However, since the time available for these operations is determined by the period of frames entered
successively, parallel processing will be needed for the mass multiplication-sum operations to be performed in a short time,
depending on the operation process cycle time of a particular digital signal processor.

The conventional motion compensation scheme is implemented as described above, and in order to ensure the
operation time for an enormous volume of operations when carried out using a digital signal processor, the processor
needs to employ parallel processing, resulting in an increased complexity and scale of hardware structure.

SUMMARY OF THE INVENTION 

The present invention is intended to overcome the foregoing prior art deficiencies, and a prime object is to provide a
digital signal processing apparatus which uses the multiprocessor parallel configuration to its maximal processing ability.

Another object of this invention is to provide a digital signal processing apparatus which works efficiently with a smaller
number of processors and a smaller memory capacity, while ensuring the latitude of the signal processing algorithm.

Still another object of this invention is to provide a digital signal processing apparatus which eliminates the need for
address control for storing the intermediate result and transfer to the memory, thereby executing a fast 3-input-1-output
operation.

A further object of this invention is to provide a motion compensative operation method which, in constructing the
motion compensator of an image coding system with a digital signal processing apparatus, requires a smaller number of
parallel processors, thereby enhancing the simplicity and compactness of the hardware structure.

In order to achieve the above objectives, the inventive digital signal processing apparatus comprises a plurality of
processors and a task controller which issues address control signals of task assignment to the processors so that they
fetch an even quantity of significant information to their memories for processing.

Furthermore, the inventive digital signal processing apparatus comprises a plurality of signal processors connected
with one another through a dual-port memory arranged in series and/or parallel, a shared memory which can be accessed
for reading and writing in blocks of signal processing or in an arbitrary number of data from all of the signal processors, a
task table which stores the process status in the signal processors, and a data flow controller which scans the contents
of the task table at a certain interval, determines a charged process of each signal processor on the basis of feedback
data including the occupancy rate of buffer memory reported by the output controller, and directs the interrupt controller
of each signal processor to start.

Furthermore, the inventive digital signal processing apparatus comprises first through third data reading address
generators adapted to read three independent data sets independently and simultaneously, and a pair of an arithmetic unit
and a multiplier adapted to execute a 3-input-1-output arithmetic operation at high speed by mutually receiving the output
of the counterpart.

The inventive motion compensation method using a digital signal processing apparatus divides a current input frame
of digital image data, which consists of a plurality of frames entered successively, into a plurality of blocks, searches the
previous input frame for a pattern which resembles the block of the input frame, and implements a coding process with
a block of minimal distortion in highest resemblance as a prediction signal, wherein in detecting a block of minimal
distortion through the computation of inter-pattern resemblance using the difference and cumulation of pixels in each
block between the block of the current input frame and blocks of M in number (M is a positive integer) in the previous
input frame, the method uses, for the pattern resemblance computation, a maximum of K pieces of pixels (K is an integer
greater than 0 and less than or equal to a total number of pixels in a block), implements an intermediate check n times
(n is an integer greater than 0) during the computation of resemblance at time points when the number of reference
pixels is smaller than K, skips the computation for remaining pixels when a cumulative value at each time point of
intermediate check is greater than a threshold value which is set for each time point of intermediate check and excludes the
block from the range of comparison for finding a minimal distortion block, and detects by computation the minimal
distortion block among blocks in which cumulative values are below the thresholds at all time points of intermediate check.
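
The intermediate check described above can be sketched as follows. This is an illustrative reading of the method, not the disclosed implementation: blocks are assumed to be flat lists of K pixel values, and the check points and thresholds are passed as a hypothetical dictionary mapping a pixel count (smaller than K) to the threshold set for that time point.

```python
def distortion_with_checks(x, y, checkpoints):
    """Accumulate |y_h - x_h| and stop early at an intermediate check.

    checkpoints maps a number of processed pixels to a threshold.
    Returns None when the block is excluded, else the full distortion.
    """
    acc = 0
    for h, (xh, yh) in enumerate(zip(x, y), start=1):
        acc += abs(yh - xh)
        # Intermediate check: skip the remaining pixels once the
        # cumulative value exceeds the threshold for this time point.
        if h in checkpoints and acc > checkpoints[h]:
            return None
    return acc


def search_with_checks(x, candidates, checkpoints):
    """Find the minimal distortion block among blocks that pass all checks."""
    best_i, best_d = None, None
    for i, y in enumerate(candidates, start=1):
        d = distortion_with_checks(x, y, checkpoints)
        if d is None:
            continue  # excluded from the range of comparison
        if best_d is None or d < best_d:
            best_i, best_d = i, d
    return best_i, best_d


# Hypothetical data: K = 4, one intermediate check (n = 1) after 2 pixels.
x = [10, 20, 30, 40]
candidates = [[12, 18, 33, 41], [10, 21, 30, 40], [0, 0, 0, 0]]
checkpoints = {2: 5}
position, minimum = search_with_checks(x, candidates, checkpoints)
```

In this example the third block is abandoned after two pixels, so only part of its K points is ever processed; the saving grows with the number of poorly matching candidates.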

BRIEF DESCRIPTION OF THE DRAWINGS 

Fig. 1 is a block diagram showing the multiprocessor system of a conventional digital signal processing apparatus;

Fig. 2 is a diagram explaining the assigned areas of the processors shown in Fig. 1;

Fig. 3 is a block diagram showing the arrangement of another conventional digital signal processing apparatus;

Fig. 4 is a block diagram showing in detail the arrangement of the signal processing module shown in Fig. 3;

Fig. 5 is a block diagram showing the algorithm of the high-efficiency coder for a moving image;

Fig. 6 is a block diagram showing the arrangement of a third conventional digital signal processing apparatus;

Fig. 7 is a flowchart showing the process of 3-input arithmetic operation using the digital signal processing apparatus
shown in Fig. 6;

Fig. 8 is a block diagram showing in brief the arrangement of the image coding transmitter which carries out the
conventional motion compensative operation method;

Fig. 9 is a diagram used to explain the conventional motion compensative operation method;

Fig. 10 is a flowchart showing the operational process for detecting a motion vector in the conventional motion
compensative operation method;

Fig. 11 is a block diagram showing the digital signal processing apparatus based on the first embodiment of this
invention;

Fig. 12 is a diagram explaining the area assignment for the processors shown in Fig. 11;

Fig. 13 is a block diagram showing the arrangement of the digital signal processing apparatus formed by connecting
in cascade a plurality of digital signal processors (DSP blocks) shown in Fig. 11;

Fig. 14 is a diagram showing the concept of process of each DSP block shown in Fig. 13;

Fig. 15 is a block diagram showing the digital signal processing apparatus based on the second embodiment of this
invention;

Fig. 16 is a block diagram showing the internal arrangement of the signal processor shown in Fig. 15;

Fig. 17 is a diagram explaining the concept of control operation of the digital signal processing apparatus shown in
Fig. 15;

Fig. 18 is a diagram explaining the relation between parameter data and processing block data in the digital signal
processing apparatus shown in Fig. 15;

Fig. 19 is a diagram showing the correspondence between data blocks and a frame;

Fig. 20 is a block diagram of the arrangement in which a plurality of digital signal processors are included in the
digital signal processing apparatus shown in Fig. 15;

Fig. 21 is a block diagram showing the digital signal processing apparatus based on another embodiment of this
invention;

Fig. 22 is a flowchart showing the operational process of the digital signal processing apparatus shown in Fig. 21;

Fig. 23 is a flowchart showing an embodiment of the inventive motion compensative operation method using a digital
signal processing apparatus;

Fig. 24 is a diagram used to explain the method of intermediate check for the computation of distortion in the inventive
motion compensative operation method; and

Fig. 25 is a diagram showing the arrangement of pixel samples at sampling points in a block according to the
intermediate check method for the distortion computation.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Specific embodiments of the present invention will now be described with reference to the drawings.

Fig. 11 shows, as an embodiment of this invention, an example of the image coder of the digital signal processing
apparatus. In the figure, input data 1 is entered to first through third input memories 6. A task controller 7 estimates
the number of valid pixels on the basis of the contents of the input memories 6, determines the distribution of the coding
process among first, second and third DSPs 2, and issues control signals as address control signals 8 to the DSPs
2. Upon receiving the address control signals 8, the first, second and third DSPs 2 issue addresses 9 to the respective first,
second and third input memories 6 to fetch data 10 assigned for processing, and implement the coding processes based
on the preset program. Upon completion of the processes, the first, second and third DSPs 2 store processed data in an
output memory 11, which, after reading the whole data of the DSP block, sends the processed data to the next DSP block.

In this case, each DSP 2 is controlled by the task controller 7 so that all DSPs 2 have even numbers of valid pixels
assigned, and therefore the image coding process time is controlled so that the difference of process times among the
DSPs 2 is minimal. Namely, in case of coding an image with numbers of valid pixels as shown in Fig. 12(b), an area A
having a relatively small number of valid pixels is enlarged to A', an area C having a relatively large number of valid
pixels is also enlarged to C', and an area B having a larger number of valid pixels is reduced to B', as shown in Fig.
12(a), by the task controller 7. The task controller 7 issues the address control signals 8 corresponding to the assignment
distribution to the first, second and third DSPs 2.

For example, in response to the issuance of the address control signal 8 for coding the image data of area A to the
first DSP 2, the first DSP 2 produces the address 9 for the area A' in the first input memory 6 to fetch data and implements
the image coding process by following the prescribed program. Similarly, the second and third DSPs 2 are directed to carry out
the image coding processes for the areas B' and C', respectively. Consequently, the first, second and third DSPs 2 have
their numbers of valid pixels EA', EB' and EC' for coding virtually made even, i.e., the same quantity of image data to
be processed, as shown in Fig. 12(b). As a result, the maximum volume of process M' dealt with by the inventive apparatus
becomes sufficiently less than that M of the conventional apparatus, and the process time required for each DSP block
is reduced.
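
The even assignment of valid pixels described above can be illustrated with a minimal sketch. It assumes that the image is divided along rows into contiguous areas and that a count of valid pixels per row is available; the function name balance_areas and the prefix-sum boundary rule are illustrative assumptions, not the method stated in this specification.

```python
from bisect import bisect_left
from itertools import accumulate


def balance_areas(valid_per_row, num_dsps):
    """Split rows into num_dsps contiguous areas with roughly equal valid-pixel totals.

    Returns a list of half-open (start_row, end_row) ranges, one per DSP.
    """
    prefix = list(accumulate(valid_per_row))  # prefix[i] = valid pixels in rows 0..i
    total = prefix[-1] if prefix else 0
    bounds = [0]
    for k in range(1, num_dsps):
        # First row at which the cumulative count reaches k/num_dsps of the total.
        target = total * k / num_dsps
        cut = min(bisect_left(prefix, target) + 1, len(valid_per_row))
        bounds.append(max(cut, bounds[-1]))
    bounds.append(len(valid_per_row))
    return list(zip(bounds[:-1], bounds[1:]))


# Hypothetical per-row valid-pixel counts for one frame.
rows = [1, 1, 2, 4, 4, 4, 2, 2, 2, 2]
areas = balance_areas(rows, 3)
loads = [sum(rows[start:end]) for start, end in areas]
```

With these hypothetical counts, an even split by rows alone would give the three DSPs unequal numbers of valid pixels, whereas the boundaries chosen here give each DSP the same load.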

Fig. 13 shows the inter-frame coder constructed by a serial connection of DSP blocks in three stages. Each DSP 
block performs the process shown in Fig. 14. The first DSP block 12 receives the input data 1 and, after producing
a differential signal, implements the valid/invalid judgment, evaluates the distribution of the numbers of valid pixels in
the image data, and sends the information to the task controller 7. Based on the information, the task controller 7 issues
address control signals 8 for dictating such address adjustment that the DSPs in the second DSP block 13 have even
assignments of data. Each DSP in the second DSP block 13 implements the process by adjusting the read address as 
described above. The third DSP block 14 is designed to operate identically. 

Although in the foregoing embodiment the DSP process assignment areas are controlled on the basis of the valid
pixel distribution among areas in image data, the present invention is not confined to this scheme, but feedback DSP
assignment control based on the general quantity distribution of transmitted information is also possible, for example.

A second embodiment of this invention will be explained with reference to the drawings. Fig. 15 shows an example
of the configuration of a digital signal processing apparatus, the second embodiment of this invention. In the figure, 301
is a data flow control section (DFC) working as a control means; 302 are control parameter data output from the data
flow control section 301; 303 is a common memory (CM) which stores feedback data, large capacity data, tables,
etc.; 304 is a task table (TB) which stores the processing status of each signal processor element (PE) 318; 305 is a
common bus (C-BUS) which functions as a status communicating means consisting of at least a bus connected
to the common memory 303, the task table 304 and each signal processor element 318; 306 is a video frame synchronizing
signal (Fp) which discriminates the starting point of a video frame, to be supplied to the data flow control section
301 in the case of inputting video signals etc.; 307 are feedback data (Fb), output from an output control section 308
described later, which inform the data flow control section 301 of the occupying status and data quantity of a sending
buffer etc. and of the finishing of one frame data processing etc.; 308 is an output control section (OC) provided with
a buffer memory for outputting data at a certain constant speed, restructuring processed blocks output from a plurality
of signal processor elements (PE) 318, for example in the scanning order in a video frame; 309 is an input terminal of
analog signals; 310 is an A/D converter; 311 are digitized input data; 312 is a parameter memory (PM) consisting of
dual port memories; 313 is an input frame buffer consisting of dual port memories which functions as a block formation
means by memorizing input data 311 temporarily; 314 is a bus connecting the parameter memory 312 to the signal
processor elements 318; 315 is a bus connecting the input frame buffer 313 to the signal processor elements 318 in order
to supply data in a block unit; 316 is a common bus input/output port connected to the common bus 305; 317 is an
interruption control port for sending/receiving timing control signals from the data flow control section 301; 318 are
individual signal processor elements (PE), which are provided with software that functions as a starting means and are
mutually connected with buses 314 and 315, the last stage signal processor element 318 and the output control section
308 also being connected with buses 314 and 315; 319 is an output terminal through which data are output at a certain
constant speed and timing from the output control section 308; 320 is a multiprocessor module comprising the parameter
memory 312, the input frame buffer 313 and a plurality of signal processors 318 connected in series through the buses
314 and 315, for example.

The data flow control section 301 has a judgment means which scans the task table 304 at a certain constant cycle
and judges the processing conditions of individual signal processor elements 318. The data flow control section 301
also has a control means which, based on the result of the judgment means, decides whether each signal processing module
can process the next signal process block; when the processing is found to be possible, it starts the process by
sending out an interruption signal to the interruption control port 317, and when the processing is found to be impossible,
it instructs the transfer of the signal process block to another signal processing module which can process the block.
When parallel processing is to be done, the constant cycle at which the task table 304 is scanned shall be the input cycle
of the signal process block multiplied by the degree of parallelism, and when series processing
is to be done the scanning period shall be 1/n of the input cycle; thus, by synchronization with the input data frame
(for example a video frame), matching with real time can be maintained.

Fig. 16 shows an example of the internal constitution of the signal processor elements 318 shown in Fig. 15. In
the figure, 330 is a terminal to which the common bus input/output port 316 is to be connected; 331 is a terminal to
which the interruption control port 317 is to be connected; 332 is a terminal to which the buses 314 and 315 are to be
connected; 333 is similarly a terminal which connects the buses 314 and 315 between the adjacent signal processors;
334 is an external bus control section (BUS-CONT) which functions as a competitive control means to control the
make/break of the common bus 305 through the bus 316; 335 is a bus for loading a writable control storage (WCS)
336, which memorizes a signal processing program, from the external bus control section 334 at an initial time; 337 is
a BUSREQ which requests the connection of the common bus 305 from the external bus control section 334; 338 is a
BUSACK which denotes the permission for the BUSREQ 337; 339 are command codes which are successively read
out from the writable control storage 336 according to the signal processing program; 340 is a digital signal processor
(DSP) which executes data processing; 341 is an INTACK which informs an interruption control section (INTR-CONT)
345 of the reception of an interruption from the digital signal processor 340; 342 is, conversely, an INTREQ
which informs the digital signal processor 340 of the requirement of an interruption; 343 is a bus to connect an internal
bus 344 to the common bus 305 through the external bus control section 334, and the internal bus 344 is directly connected
to the digital signal processor 340; 345 is an interruption control section (INTR-CONT) which processes an
interruption signal from the data flow control section 301; 346 is a bus which writes the parameter of a processed data
block on a dual port memory 349 through the internal bus 344; 347 is similarly a bus which writes a processed block on
the dual port memory 349; 348 is a bus which connects a work memory in the dual port memory 349 and the internal
bus 344; 349 is a dual port memory provided with a parameter memory, data memory and work memory which outputs
data to the adjacent signal processor element 318 through the terminal 333 and buses 314 and 315.

Fig. 17 explains the internal control operation of the digital signal processing apparatus shown in Fig. 15, and the 
same parts as those shown in Fig. 15 are given the same symbols; the explanation of them is therefore omitted. 

In the figure, 351 is a block which shows the analytical operation of a parameter inside the signal processor element
318; 352, 353 and 354 are blocks which show the operation of individual signal processing subroutines A, B and C according
to the parameter of each of them; 355 is a block which shows the contents of a parameter memorized in the dual port
memory 349; 356 is a block which shows the contents D of processed block data memorized in the dual port memory 349.

Fig. 18 explains an example of the relation between the parameter data and process block data until a data block
is successively given a series of function processes and an output result is obtained through series and parallel processes
of block units executed in the digital signal processing apparatus shown in Fig. 15. In the figure, 360 is a block address
(BAD) showing the position of an input block in a frame; 361 is a processing number (PIM) showing the kind of a process
to be given to said block; 362 is a flag (PFLG) which discriminates the result of the process; 363 is a data block in which
for example eight subblocks are combined to form a block.

Fig. 19 shows an example of correspondence between the data block 363 shown in Fig. 18 and one video frame
when a picture coding process is performed in this system. In the figure, 365 is one video frame; 366 is a data block
when a picture is divided into 16 lines x 16 pixels; 367 is a subblock which is obtained when the block is further divided
into 8 blocks of 4 lines x 4 pixels.

An explanation of the operation based on Fig. 15 is given in the following. Input data 311 digitized by an A/D converter
310 are memorized in an input frame buffer 313, being scanned in a raster form in synchronization with a video frame
synchronizing signal 306, for example. The data flow control section 301 adds initial parameter data 302, block by block,
to the input data 311 memorized in the input frame buffer 313, and the parameter data 302 are memorized in the parameter
memory 312. The parameter memory 312 and the input frame buffer 313 consist of dual port memories, and writing and reading
are simultaneously possible through two independent ports.

Data blocks are read from the input frame buffer 313, and the parameter is read in a data block unit from the parameter
memory 312. Data blocks and parameters are sent through the buses 314 and 315 to the signal processor element 318,
where they are given the first process of a series of functional processes in a block unit. Next, the results and the rewritten
parameters are written in the dual port memory 349 in the signal processor element 318. It is the basic function of a
processor module 320 to execute processes successively between the adjacent signal processor elements 318 and to
execute pipeline processing for each block unit.

When processing is executed for each block unit, if feedback data such as coded previous frame data are to be
referred to, the feedback data are input to the common memory 303 connected to the common bus 305 and memorized.
A new video frame is processed in such a way that a signal processor 318 other than the one
which has written the data through the common bus 305 refers to the common memory 303. If the writing of the feedback data
of the previous frame is not completed in the proper position in the common memory 303, the execution time of the
process shall be specified.

When the processing of a unit (block processing) is finished, each signal processor element 318 memorizes the
status showing the completion of the present processing in the task table 304, and waits for the next processing. The data
flow control section 301 scans the task table 304 and, when the processing of the former stage signal processor element
318 is completed, it sends out an interruption signal to said signal processor element 318 and starts the next processing.
By repeating the operation, the execution of the operation control of each signal processor element 318 is performed.
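
The scanning of the task table described above can be modeled with a short sketch. It assumes a purely serial chain of signal processor elements and three status values (idle, busy, done); these status names, the function scan_task_table and the rule that a stage is started only when it is idle are assumptions made for illustration.

```python
IDLE, BUSY, DONE = "idle", "busy", "done"


def scan_task_table(task_table):
    """Perform one scan; return the indices of the stages that receive an interruption."""
    to_start = []
    # Walk from the last stage backwards so that a stage freed in this scan
    # can accept a block from its predecessor in the same scan.
    for stage in range(len(task_table) - 1, 0, -1):
        if task_table[stage - 1] == DONE and task_table[stage] == IDLE:
            task_table[stage - 1] = IDLE  # former stage has handed its block over
            task_table[stage] = BUSY      # next stage is started by an interruption
            to_start.append(stage)
    return to_start


table = [DONE, IDLE, DONE, IDLE]
started = scan_task_table(table)
```

In this model each element would set its own entry to done when its block processing is finished, which corresponds to the status written in the task table 304.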

To conduct parallel processing in a block unit for each processor module 320, the data processing condition in the
input frame buffer 313 of each processor module 320 is detected from the status information of the initial stage signal
processor 318, and individual block data are distributed by proper load distribution and input to each multiprocessor
module 320.

These results are shown by the control parameter data of the initial stage, and the signal processor element 318
discriminates the processing for the block by deciphering the above results and executes the proper processing. Among
these processings there are, for example, functional processes such as those of a block identifier 253, a coder 254, a local
decoder 260, an inter-frame subtracter 252, a motion compensator 265, an inter-frame adder 261 and a variable length coder 256,
and besides them a processing which performs only load distribution, such as a processing of transferring block data, is
included.

In the data flow control section 301, it is possible to make an arbitrary signal processor 318 undertake an arbitrary
processing by controlling the first stage parameter; owing to this, the load can be so
distributed to the signal processor elements 318 as to make them work as efficiently as possible.

The output control section 308 reconstitutes processed blocks, which are output at random times, into for example
the scanning order of an input video frame, produces a resultant output for an output terminal 319, and also produces
feedback data 307 to inform the data flow control section 301 of these data.
The output control section 308 takes charge for example of a video multiplex section 257 and a transmitting buffer
258 shown in Fig. 5, and it outputs a feedback signal 269 from the transmitting buffer 258 to a coding control section
270 which takes charge of the data flow control section shown in Fig. 15.
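
The reconstitution of processed blocks into scanning order by the output control section 308 can be sketched as follows. The sketch assumes that each block carries an integer block address and that the addresses within a frame run 0, 1, 2, ... in scanning order; the function name reorder_blocks and the dictionary used as a buffer are illustrative assumptions.

```python
def reorder_blocks(arrivals):
    """Yield (block_address, data) pairs in ascending block-address order.

    arrivals: iterable of (block_address, data) pairs in order of completion.
    """
    pending = {}      # buffer for blocks that arrived ahead of their turn
    next_address = 0
    for address, data in arrivals:
        pending[address] = data
        # Release every block that is now contiguous with what was already output.
        while next_address in pending:
            yield next_address, pending.pop(next_address)
            next_address += 1


# Hypothetical completion order of four blocks.
ordered = list(reorder_blocks([(2, "c"), (0, "a"), (1, "b"), (3, "d")]))
```

The number of blocks waiting in the buffer at any moment is one quantity that feedback data such as 307 could report, although the exact contents of the feedback are not modeled here.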

The data flow control section 301 takes charge of the functions of the above-mentioned load distribution and of the coding
control section 270 shown in Fig. 5, finds the block identification control signal 273 and the coding control signal 274,
and multiplexes them into the control parameter data for the execution of the whole characteristic control. Referring to Fig. 16,
the processing of a single signal processor element 318 is started by the interruption from the data flow control section
301, and the contents of the parameter memory 312 are input to it through an internal bus 344. On the basis of the
discrimination result of the contents, the processing of one unit of block data is performed by a digital signal processor
340.

The result and rewritten parameters are written in a dual port memory 349, and the status is set in the task table
304 through an external bus control section 334; thus the preparation for the next process is complete. An interruption control
section 345 interfaces the interruption from the data flow control section 301 with the digital signal processor 340. The
parameter and the data written in the dual port memory 349 are read by an adjacent signal processor element 318 which
is connected to a terminal 333, and the next stage process is given.

Fig. 17 shows the flow of these processes performed by the data flow control section 301, and it shows the relation
between the control of writing/referring of feedback data to the common memory 303 and the control of status writing
in the task table 304 by the data flow control section 301 through the common bus 305, and the start processing control
in the signal processor element 318 by a parameter analyzer 351.

Fig. 18 shows the rewriting of the contents of control parameter data 302, which are added corresponding to
input block data 363, and the flow of these processes. A block address 360, which shows for example the position in a frame
or the time sequential order of a block, and a flag 362, which is referred to for the kind and the contents
of the next process, are contained in the control parameter data 302. The block address 360 is used for the discrimination
of a special process in a certain case, for example at an end point in a picture, or for the restructuring of data in the output
control section 308 when a process is finished. The flag 362 shows for example the results etc. of coding control information
271, a block identification control signal 273, a coding control signal 274, and a block identifier 253 as shown in
Fig. 5. Input block data 363 are set to have the minimum size handled in a unit processing. The motion compensator
265 shown in Fig. 5 has a block of 16 x 16 size, and after the block identifier 253 blocks of 4 x 4 size are handled. In
such a case, where the block size differs for each unit processing, block sizes are arranged to have
matching between a maximum block size and a subblock size contained in it. In this case, eight pieces of 4 x 4 blocks
are combined to constitute a 16 x 16 block. When coding of a picture is performed, this block corresponds to a small
picture element made by dividing an ordinary one frame into small square picture elements.

Fig. 19 shows an example where one video frame 365 is divided into a block 366 and subblocks 367. 

In the above embodiment, a signal processor element 318 which has a single digital signal processor 340 is shown,
but when higher speed processing is preferable, a hierarchical structure combining a plural number of digital signal
processors can be used. The constitution of the signal processor element 318 in the case of the hierarchical structure
is shown in Fig. 20. In this case, as the load for the data flow control section 301 increases, a local data flow control
section 370, a local common memory 371, and a local task table 372 are provided inside the signal processor 318 in
order to locally execute the optimum load distribution inside the signal processor. The data flow of the digital signal
processor 340 which is connected to a local common bus 373 is the same as that shown in Fig. 15 except that the
operation is executed inside the signal processor 318.

In the above embodiment, a series-parallel structure is adopted, but in some cases a completely parallel or completely
series structure is effective according to the purpose of the signal processing, and real time processing could be possible.
The other embodiment of this invention is explained with reference to Fig. 21. In Fig. 21, 420, 421 and 422 are
address generators for readout data; 423 is an address generator for writing data; 424, 425 and 426 are data memories,
and address data generated by the address generator 423 are input to these memories; 427, 428 and 429 are data
buses which transfer readout data from the data memories 424, 425 and 426; 430, 431 and 432 are registers for holding
data transferred from data buses 427, 428 and 429; 433 is a register to hold the output of the register 432; 434 is a
selector to select the output of the register 430 or that of the register 433; 435 is a selector to select the output of the
register 431 or that of the register 441; the selector 434 and the selector 435 constitute a first selector group; 436 is a
selector to select the output of the register 430 or the output of a register 439; 437 is a selector to select the output of
the register 431 or the output of the register 433; the selector 436 and the selector 437 constitute a second selector
group; 438 is an operator which operates on the outputs of the selectors 434 and 435; 440 is a multiplier which
performs multiplication on the outputs of the selectors 436 and 437; the register 439 holds the output
of the operator 438; the register 441 holds the output of the multiplier 440; 442 is a selector which selects the
input from the register 439 or the input from the register 441 and outputs it; 443 is an adder which adds the output of
the output selector 442 and the output of an accumulator 444 and outputs to said accumulator 444; 445 is a data bus
to transfer output data of the accumulator 444 and the output selector 442; 446 is an interface circuit which performs
outputting/inputting of data to/from external circuits; 451 - 453, 461 - 463 and 471 - 473 denote signal lines which output
the output of data memories 424, 425 and 426 to data buses 427, 428 and 429.

The following is the explanation of operation. In Fig. 21, assume that data series with N elements,
A = (ai | i = 1 to N), B = (bi | i = 1 to N), C = (ci | i = 1 to N), are previously stored respectively in the data memory 424,
data memory 425, and data memory 426.

Under the conditions above, the operation when the operation of three inputs and one output is performed is shown 
below. The operation processing flow is shown in Fig. 22. 

To begin with, at a step ST31, top addresses of three series of input data and of an output result storing memory
are initially set by address generators 420, 421 and 422. After that the address generators are assumed to take simple
increment actions.

The data memory 424 corresponds to the address generator 420; the data memory 425 corresponds to the address 
generator 421; the data memory 426 corresponds to the address generator 422. Individual data memories 424, 425 
and 426 read out data based on the addresses of address generators 420, 421 and 422.

Data are input to three data buses 427, 428 and 429 (X-BUS, Y-BUS, Z-BUS) respectively from data memories 424,
425 and 426, so that for the outputting of each of these data memories 424, 425 and 426 to a specified data bus, only
one bus out of three is controlled to be effective, and the other two are controlled to be in the state of a high impedance.
In this case, the output of data buses is limited to that of the one which is made to be effective. For example, when the A
data series is to be input to the register 430, the A series data are output to the signal line 451, and the signal lines 461
and 471, which output data from the other data memories 425 and 426 to the data bus 427, are in the state of a high
impedance. The same thing goes for the other data buses.

Each of these data series is set respectively in the registers 430, 431 and 432. Three data buses 427, 428 and
429 can select data from three data memories 424, 425 and 426, so that 3^3 (= 27) kinds of data set combinations can be
supplied to the registers 430, 431 and 432.
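
The count of data set combinations can be checked by enumeration. The sketch below assumes that each of the three data buses independently takes one of the three data memories as its single active driver; the variable names are illustrative.

```python
from itertools import product

MEMORIES = (424, 425, 426)  # data memories
BUSES = (427, 428, 429)     # X-BUS, Y-BUS, Z-BUS feeding registers 430, 431, 432

# Each bus independently selects one memory as its only effective source.
combinations = [dict(zip(BUSES, choice)) for choice in product(MEMORIES, repeat=len(BUSES))]
```

Each entry maps a data bus to the data memory driving it, and the list holds 27 entries, including those in which the same memory drives more than one bus.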

Two expressions as shown below are defined as three input operations, and the processing method
is shown in the following:

(ai ⊕ bi) × ci   (7)
(ai × bi) ⊕ ci   (8)

where (x ⊕ y) expresses an arithmetic and logic operation for finding results or values of addition, subtraction, maximum
values or minimum values for two input data x, y, and (x × y) expresses multiplication. The explanation of the operation
processing flow of the expression (7) is given in Table 3. The mark "X" in the table represents an unknown.

At a step ST32, a selector 434 selects the side of a register 430 and a selector 435 selects the side of a register
431. By the use of these two selected data (ai and bi), the operation (ai ⊕ bi) is performed with an operator 438, and the
result is stored in a register 439. This value is output from the register 439 in the next step.
The data ci in the register 432 are delayed by the register 433 by one step. In the next step, a selector 436 selects
the side of the register 439 and a selector 437 selects the side of the register 433. By the use of these two data, (ai ⊕ bi)
is multiplied by ci with the multiplier 440 and the result (ai ⊕ bi) × ci is stored in a register 441. This value is output from
the register 441 in the next step. When an output selector 442 selects the register 441, the data (ai ⊕ bi) × ci are sent
to one of the data memories 424, 425 and 426 through a data bus 445 based on the address shown by the address
generator 423.

In this invention, readout of data, execution of operation and writing of data are continuously executed by pipeline
processing, so that the control of each section can be operated in parallel. Therefore, if the three input one output operation
is executed for a data series with N elements, a period of (N + 3) cycles is required from the time when the first datum
is read out until the time when the processing result of the last datum is written into a memory.
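
The cycle count can be checked with a small cycle-level model of the three input one output operation in which the arithmetic and logic operation is applied first and the multiplication second. The model is a sketch under assumptions: one clock step per stage (memory read, operator 438, multiplier 440, write-back), the Python value None standing for an empty register, and variable names taken from the register numerals for readability. The function name run_pipeline and the argument op are hypothetical.

```python
def run_pipeline(a, b, c, op):
    """Model of the (a_i op b_i) * c_i pipeline; returns (results, cycles)."""
    n = len(a)
    r430 = r431 = r432 = r433 = r439 = r441 = None
    results = []
    cycles = 0
    while len(results) < n:
        # All registers change on the same clock step, so the new values
        # are computed from the old ones before anything is assigned.
        if r441 is not None:
            results.append(r441)                                 # write-back stage
        new_r441 = r439 * r433 if r439 is not None else None     # multiplier 440
        new_r439 = op(r430, r431) if r430 is not None else None  # operator 438
        new_r433 = r432                                          # one-step delay of c_i
        if cycles < n:
            r430, r431, r432 = a[cycles], b[cycles], c[cycles]   # memory read
        else:
            r430 = r431 = r432 = None
        r433, r439, r441 = new_r433, new_r439, new_r441
        cycles += 1
    return results, cycles


results, cycles = run_pipeline([1, 2, 3], [4, 5, 6], [2, 2, 2], max)
```

For three elements the model finishes in six steps, which agrees with (N + 3) for N = 3; here max stands in for one of the selectable arithmetic and logic operations.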

The explanation of the operation processing flow of expression (8) is given in Table 4. The mark V in Table 4 represents
an unknown.

The operation in which three input data are read out to registers 430, 431 and 432 is the same as that in the case
of expression (7). When the operation of expression (8) is executed, the selector 436 selects the side of the register 430
and the selector 437 selects the side of the register 431, and the operation (ai × bi) is performed by the multiplier 440
and the result is set in the register 441.

In the next step, the selector 434 selects the side of the register 433 and the selector 435 selects the side of the
register 441, and the operation (ai × bi) ⊕ ci is executed by the operator 438 and the result is set in the register 439.
In the next step, the selector 442 selects the side of the register 439, and the selected result is written into one of the
data memories 424 to 426.

Thus the case of the operation of expression (8) is the same as the case of expression (7), and the total processing
time again requires (N + 3) cycles.

In the case of the operation of two input one output, the value of (ai ⊕ bi) can be obtained through the procedure
as shown in the following: the selector 434 selects the side of the register 430 and the selector 435 selects the side of
the register 431, and after the operation is executed by the operator 438, the side of the register 439 is selected by the
selector 442 in the next step. The value of (ai × bi) can be obtained through the procedure as shown in the following:
the selector 436 selects the side of the register 430 and the selector 437 selects the side of the register 431, and after
the execution of the operation with the multiplier 440, the selector 442 selects the side of the register 441 in the next step.

The processing speed in the case of three input one output is (2N + 10)/(N + 7) times that of the prior art, that is almost
half times if N is a large number.

When a cumulative value is to be found in the three input one output operation, a cumulative value till a point on the 
way or an initial value is stored in the accumulator 444 and each one of the successive operation results is added to the 
cumulative value in the accumulator 444 with the adder 443 and the added result is stored in the accumulator 444 again. 

These processes are performed repeatedly. Processing cycles therefore are not increased due to cumulative operation.

Fig. 23 shows a flow chart to realize a method for motion compensative operation which refers to an embodiment
of this invention. Fig. 24 is a drawing for the explanation of an intermediate check method in the distortion quantity
operation in this invention. Fig. 25 is a disposition drawing of pixel samples at sample points in a block in the intermediate
check method for distortion operation in this invention.

Before the operation process, on the first block among M pieces of candidate blocks for search in the previous frame
data, the distortion quantity of all the pixels in the block shall be measured; the distortion quantity in this case is defined to
be the minimum distortion D. As for the distortion quantity, the differential absolute value sum is adopted. In the distortion
quantity operation for the second and subsequent blocks, the calculation of differential absolute values of all pixels is not
needed; if the intermediate distortion quantity exceeds a certain value at an intermediate check point, it is judged
that the ultimate distortion quantity of the block cannot be smaller than the minimum distortion D, and the distortion
quantity operation for the residual part is stopped.

A block which gives the minimum distortion is detected by the calculation of the degree of approximation between
the patterns by using the difference and accumulation of pixels in the respective M blocks which are selected out of the
present input frame and the previous input frame (M is a positive integer). The number of pixels used for the calculation
of the degree of approximation is K at the maximum (K is an integer greater than or equal to one and smaller than or
equal to the total number of pixels in one block). During the calculation of the degree of approximation, at
the time when the number of pixels in reference is less than K, intermediate checks are performed four times, and an
intermediate check point shall be provided at each 1/4 sample point. Fig. 25 shows examples of sample points used for
distortion quantity operation. The mark O expresses a first time sample point for distortion quantity operation; the mark
x expresses a second time sample point for distortion quantity operation; the mark a expresses a third time sample point
for distortion quantity operation; the mark © expresses a fourth time sample point for distortion quantity operation.

In Fig. 24, when the total number of sample points is assumed to be K, express the threshold levels at the first, second
and third intermediate check points as d1', d2' and d3'; then put

d1' = D/4 + th1   (9-1)
d2' = D/2 + th2   (9-2)
d3' = 3D/4 + th3   (9-3)

where th1, th2 and th3 can be set independently. Express the distortion quantity at the first, second and third intermediate
points as di1, di2 and di3.

In this case, di1 expresses the value of the first time distortion quantity in Fig. 25; di2 expresses a cumulative value,
di1 plus the second time distortion quantity; di3 expresses a cumulative value, di2 plus the third time distortion quantity.
Therefore, the cumulative value in which the fourth time distortion quantity is accumulated becomes the distortion quantity
in which all sample points are included.

On the basis of a distortion quantity judgment at an intermediate check point, if a block is estimated to have a large
distortion quantity, the checking of the block is canceled before the block reaches the last check point to save useless
operation processes. In other words, if a distortion quantity di1 which is obtained by a distortion calculation at the 1/4 K
sample point in a step ST41 is found to be di1 > d1' by the judgment in the next step ST42, the block is canceled; if not,
the operation is continued to the next step ST43 and the operation of distortion quantity di2 is performed with the distortion
calculation at the 1/2 K sample point. If it is found that di2 > d2' in the judgment in the next step ST44, the block is canceled,
and if not, the operation is continued into the step ST45. The operation of distortion quantity di3 is performed with the
calculation at the 3/4 K sample point, and if it is found that di3 > d3' by the judgment in the next step ST46, this block is
canceled; if not, the operation is continued into the step ST47 and the distortion quantity di at the K sample point is calculated
for performing comparison and renewal.

As shown above, if the processing is performed up to the last step, the same result is obtained as with the
conventional method in which all pixels are used for the distortion operation. If the distortion quantity di
in this case is smaller than the minimum distortion D, the value of the minimum distortion D is renewed to di and the
motion vector index I is renewed to the index i. The final minimum distortion D and the vector index I which shows
the movement giving D are obtained by repeating the above operating processes as many
times as the number of search vectors, until the process reaches the Mth block.
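The search loop with intermediate checks described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the apparatus itself: the blocks are assumed to be flat lists of K pixel values already arranged in the order in which the sample points are visited, the minimum distortion D is assumed to start at infinity, and the names `distortion_with_checks` and `search` are hypothetical.

```python
def distortion_with_checks(cur, ref, D, th=(0, 0, 0)):
    """Accumulate absolute differences over K samples in four passes.

    Returns the full distortion di, or None if the block is canceled at
    an intermediate check (di1 > d1', di2 > d2' or di3 > d3').
    """
    K = len(cur)
    quarter = K // 4
    # Thresholds of equations (9-1) to (9-3).
    limits = (D / 4 + th[0], D / 2 + th[1], 3 * D / 4 + th[2])
    acc = 0
    for stage in range(4):
        start = stage * quarter
        stop = K if stage == 3 else start + quarter
        for k in range(start, stop):
            acc += abs(cur[k] - ref[k])
        # Intermediate check after the 1/4 K, 1/2 K and 3/4 K sample points.
        if stage < 3 and acc > limits[stage]:
            return None
    return acc


def search(cur, candidates, th=(0, 0, 0)):
    """Return (D, I): minimum distortion and index of the best candidate."""
    D = float("inf")  # assumed initial value
    I = None
    for i, ref in enumerate(candidates):
        di = distortion_with_checks(cur, ref, D, th)
        # Comparison and renewal, performed only for blocks not canceled.
        if di is not None and di < D:
            D, I = di, i
    return D, I
```

A canceled candidate costs only the samples accumulated up to the failed check, which is where the saving in operation processes comes from.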

In the above embodiment, an example is shown in which the differential absolute value sum is used for the distortion
quantity operation, but the differential square value sum can also be used.
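The two distortion measures mentioned here differ only in the per-pixel term, as the following sketch illustrates. The function names are assumptions made for this example, and the blocks are assumed to be flat lists of pixel values.

```python
def abs_diff_sum(cur, ref):
    """Differential absolute value sum between two blocks."""
    return sum(abs(a - b) for a, b in zip(cur, ref))


def square_diff_sum(cur, ref):
    """Differential square value sum between two blocks."""
    return sum((a - b) ** 2 for a, b in zip(cur, ref))
```

Both measures are non-negative and accumulate monotonically, so a cumulative value can be compared with a threshold at each intermediate check point in either case.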

In the above embodiment, the case of the motion compensative operation is explained, but the method can also be
applied to an inner product vector quantization operation, and the same effect can be obtained. In that case, when an operation
result is compared with a threshold value at an intermediate check point, the relation in magnitude is opposite to that
described in the above embodiment.
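The reversed comparison for the inner product case might look like the sketch below. The text does not give threshold formulas for this case, so the three lower limits are simply supplied by the caller; the function name, the four-pass split and the parameter `lower_limits` are assumptions made for illustration only.

```python
def inner_product_with_checks(x, codeword, lower_limits):
    """Accumulate the inner product of x and codeword in four passes.

    Returns the full inner product, or None if the candidate is canceled
    because a partial result falls BELOW the limit for that check point
    (the opposite relation to the distortion case).
    """
    K = len(x)
    quarter = K // 4
    acc = 0
    for stage in range(4):
        start = stage * quarter
        stop = K if stage == 3 else start + quarter
        for k in range(start, stop):
            acc += x[k] * codeword[k]
        # Cancel when the partial inner product is too small.
        if stage < 3 and acc < lower_limits[stage]:
            return None
    return acc
```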

Claims 

1. A motion compensative operation method wherein a current input frame of digital image data consisting of a plurality
of frames entered successively is divided into a plurality of blocks, and, in detecting a block which provides a minimal
distortion resulting from computation of resemblance between patterns based on the cumulation of differential absolute
values or differential square values of pixels in a block between a block of the current input frame and blocks of
M in number (M is a positive integer) of the previous frame, a maximum of K (K is an integer greater than or equal
to one and smaller than or equal to the total number of pixels in one block) pixels are used for the
pattern resemblance computation, intermediate checks are conducted n times (n is an integer greater than or equal
to one) during the resemblance computation at time points when the number of reference pixels is smaller than K,
a block is determined to be outside the range of comparison for finding a minimal distortion block if a cumulative
value at each time point is greater than a threshold value preset for each intermediate check time point, and a block
having a minimal distortion is detected by computation from among blocks whose cumulative values are below the
threshold values at all time points of intermediate checks.



FIG. 1 (drawing not reproduced)


FIG. 2
(a) DIVISION OF MEMORY AREA
(b) DISTRIBUTION OF NUMBERS OF VALID PIXELS IN EACH AREA (areas EA, EB, EC)




FIG. 6 (block diagram; labeled elements: ADDRESS GENERATOR, 2P-RAM, SELECTOR, REGISTER, MULTIPLIER, ARITHMETIC/LOGIC UNIT, ACC 0 to ACC 3)




FIG. 7 (flowchart)
START
ST1: INITIALIZE ADDRESS
ST2: PARALLEL OPERATION: READ TWO PIECES OF DATA (INCREMENT ADDRESS), COMPUTE, TRANSFER RESULT (TO EXTERNAL MEMORY)
ST3: INITIALIZE ADDRESS
ST4: READ EXTERNAL DATA USELESSLY
ST5: PARALLEL OPERATION: READ TWO DATA (INCREMENT ADDRESS), COMPUTE, TRANSFER RESULT (TO INTERNAL MEMORY)
END
TOTAL OF 2N+10 CYCLES




FIG. 8 (block diagram; labeled elements: CODER, DECODER, MOTION COMPENSATOR, FRAME MEMORY)


FIG. 14 (block diagram; labeled elements: VALID/INVALID JUDGMENT, CODER, TASK CONTROLLER, DECODER, FRAME MEMORY, FILTER)




FIG. 10 (flowchart)
START
REPEAT AS MANY TIMES AS THE NUMBER OF SEARCH VECTORS:
ST11: COMPUTE DISTORTION AT K SAMPLE POINTS: di
ST12: COMPARISON, RENEWAL: IF di < D THEN D = di AND I = i
END


FIG. 11 (block diagram; labeled elements: TASK CONTROLLER, INPUT MEMORY 1 to 3, DSP 1 to 3)






FIG. 12
(a) DIVIDED PROCESSING AREA FOR EACH DSP
(b) DISTRIBUTION OF VALID PIXELS FOR EACH DSP (areas EA', EB', EC')


FIG. 13 (block diagram; labeled elements: TASK CONTROLLER, three DSP BLOCKs, FRAME MEMORY)




FIG. 21 (block diagram; labeled elements: three ADDRESS GENERATORs, three DATA MEMORY units, X-BUS, Y-BUS, Z-BUS, W-BUS, SELECTOR, ACCUMULATOR)
<o 




FIG. 22 (flowchart)
START
ST31: INITIAL SETTING OF ADDRESS (4 CYCLES)
ST32: PARALLEL OPERATION: THREE DATA READOUT (ADDRESS INCREMENT), ARITHMETICAL OPERATION, MULTIPLICATION, OUTPUT TRANSFER (N + 3 CYCLES)
END
TOTAL: N + 7 CYCLES


FIG. 24 (graph; horizontal axis: NUMBER OF SAMPLES)




Flowchart of the distortion operation with intermediate checks (figure label not legible)
START
ST41: DISTORTION QUANTITY CALCULATION AT 1/4 K SAMPLE POINT: di1
ST42: judgment of di1 against d1'
ST43: DISTORTION QUANTITY CALCULATION AT 1/2 K SAMPLE POINT: di2
ST44: judgment of di2 against d2'
ST45: DISTORTION QUANTITY CALCULATION AT 3/4 K SAMPLE POINT: di3
ST46: judgment of di3 against d3'
ST47: DISTORTION QUANTITY CALCULATION AT K SAMPLE POINT: di; COMPARISON, RENEWAL




FIG. 25 (8 x 8 arrangement of sample points in which the four kinds of marks are interleaved)




